Attended End-to-end Architecture for Age Estimation from Facial Expression Videos

Authors

Wenjie Pei

Hamdi Dibeklioglu

Tadas Baltrusaitis

David M. J. Tax

Abstract

The main challenges of age estimation from facial expression videos lie not only in modeling the static facial appearance, but also in capturing the temporal facial dynamics. Traditional approaches to this problem focus on constructing handcrafted features to explore the discriminative information contained in facial appearance and dynamics separately. This relies on sophisticated feature refinement and framework design. In this paper, we present an end-to-end architecture for age estimation which is able to simultaneously learn both the appearance and dynamics of age from raw videos of facial expressions. Specifically, we employ convolutional neural networks to extract effective latent appearance representations and feed them into recurrent networks to model the temporal dynamics. More importantly, we propose to leverage attention models for salience detection in both the spatial domain, for each individual image, and the temporal domain, for the whole video. We design a specific spatially-indexed attention mechanism among the convolutional layers to extract the salient facial regions in each individual image, and a temporal attention layer to assign attention weights to each frame. This two-pronged approach not only improves performance by allowing the model to focus on informative frames and facial areas, but also offers an interpretable correspondence between spatial facial regions and temporal frames on the one hand, and the task of age estimation on the other. We demonstrate the strong performance of our model in experiments on a large, gender-balanced database of 400 subjects with ages spanning from 8 to 76 years. Experiments reveal that our model significantly outperforms state-of-the-art methods given sufficient training data.
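The attention idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how attention scores are computed or normalized, so the softmax normalization, the function names (softmax, attend), and the toy numbers below are all assumptions made for illustration. The same weighted pooling could in principle be applied over frames (temporal attention) or over spatial locations of a feature map (spatial attention).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, scores):
    """Pool feature vectors with attention weights softmax(scores).

    features: list of equal-length vectors (one per frame or spatial location)
    scores:   one unnormalized relevance score per vector (hypothetical here;
              in a trained model a learned layer would produce them)
    """
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return pooled, weights

# Toy video: 3 frames, each already reduced to a 2-D feature vector
# (standing in for CNN + recurrent outputs; values are made up).
frame_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frame_scores = [0.0, 0.0, 2.0]  # assumed scores: the third frame matters most
video_repr, frame_weights = attend(frame_features, frame_scores)
```

In this sketch the weights sum to one, so they can be read directly as the relative importance of each frame, which is the kind of interpretability the abstract refers to.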


Similar resources

Videos as Global Networks in the Practice of Migration (An Iranian Case Study)

Network society is an ever-changing, robust system that expands with new nodes as long as they can communicate. Videos, as a source of information and communication, are among the most strategic nodes in this architecture. The present study is a scholarly attempt to investigate the effects of videos on facilitating the process of migration for Iranian students. To this end, our case studies partici...


Learning to Extract Motion from Videos in Convolutional Neural Networks

This paper shows how to extract dense optical flow from videos with a convolutional neural network (CNN). The proposed model constitutes a potential building block for deeper architectures to allow using motion without resorting to an external algorithm, e.g. for recognition in videos. We derive our network architecture from signal processing principles to provide desired invariances to image c...


A new classification method based on pairwise SVM for facial age estimation

This paper presents a practical algorithm for facial age estimation from a frontal face image. Facial age estimation generally comprises two key steps: age image representation and age estimation. The anthropometric model used in this study includes computation of eighteen craniofacial ratios and a new accurate skin wrinkles analysis in the first step and a pairwise binary support vector...


Deep Regression Forests for Age Estimation

Age estimation from facial images is typically cast as a nonlinear regression problem. The main challenge of this problem is that the facial feature space w.r.t. ages is heterogeneous, due to the large variation in facial appearance across different persons of the same age and the nonstationary property of aging patterns. In this paper, we propose Deep Regression Forests (DRFs), an end-to-end model,...


Automatic Human Age Estimation System for Face Images

INTRODUCTION: With the development of smart devices, such as smart phones and smart televisions, natural user interfaces (NUIs) become increasingly attractive. In addition, with the vigorous research on three-dimensional (3D) video processing techniques on 3DTV, 3DTV NUIs can be also considered. NUIs offer the advantage of natural interaction with a system using predefined actions and/or physic...



Journal title:

CoRR

Volume abs/1711.08690

Publication date: 2017
